    A medication extraction framework for electronic health records

    Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 71-76).
    This thesis addresses the problem of concept and relation extraction in medical documents. We present a medical concept and relation extraction system (medNERR) that incorporates hand-built rules and constrained conditional models. We focus on two concept types (i.e., medications and medical conditions) and on the pairwise administered-for relation between them. For medication extraction, we design a rule-based baseline, medNERR^greedy_med, that identifies medications using the UMLS dictionary. We enhance this baseline with information from topic models and additional corpus-derived heuristics, and show that the final medication extraction system outperforms the baseline and improves on state-of-the-art systems. For medical condition extraction, we design a Hidden Markov Model with conditional constraints. The conditional constraints encode world knowledge in the probabilistic model and help support its decisions. We approach relation extraction as a sequence labeling task: we label the context between the medications and the medical conditions involved in an administered-for relation, again using a Hidden Markov Model with conditional constraints. We show that the relation extraction system outperforms current state-of-the-art systems and that its main advantage comes from incorporating domain knowledge through the conditional constraints. We also compare this sequence labeling approach with a classification approach and show that sequence labeling improves final system performance.
    By Andreea Bodnari. S.M.
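The medication baseline in this abstract identifies medications by looking them up in the UMLS dictionary. A minimal sketch of a greedy longest-match lookup over tokens follows, assuming a toy in-memory lexicon and whitespace tokenization. The `greedy_match` function and the `LEXICON` entries are hypothetical illustrations, not the thesis's implementation and not a UMLS API.

```python
from typing import List, Set, Tuple


def greedy_match(tokens: List[str], lexicon: Set[Tuple[str, ...]],
                 max_len: int = 4) -> List[Tuple[int, int]]:
    """Return (start, end) token spans of lexicon entries, preferring the
    longest entry that starts at each position (greedy, non-overlapping)."""
    lowered = [t.lower() for t in tokens]
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        match_end = None
        # Try the longest candidate first and shrink until an entry matches.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in lexicon:
                match_end = i + n
                break
        if match_end is None:
            i += 1  # no entry starts here; move to the next token
        else:
            spans.append((i, match_end))
            i = match_end  # greedy: resume after the matched span


    return spans


# Hypothetical toy lexicon standing in for UMLS medication entries.
LEXICON = {("aspirin",), ("insulin",), ("insulin", "glargine"), ("metformin",)}

tokens = "Started insulin glargine and aspirin daily".split()
print(greedy_match(tokens, LEXICON))  # → [(1, 3), (4, 5)]
```

Because the longest entry wins, "insulin glargine" is returned as one span rather than as "insulin" alone; the thesis's enhancements (topic models, corpus-derived heuristics) are not modeled here.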

    Joint multilingual learning for coreference resolution

    Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. Cataloged from PDF version of thesis. Includes bibliographical references (pages 112-120).
    Using natural language is a pervasive human skill that automated computing systems cannot yet fully replicate. The main challenge is understanding how to computationally model both the depth and the breadth of natural languages. In this thesis, I present two probabilistic models that systematically capture both dimensions for two different linguistic tasks: syntactic parsing, and joint named entity recognition and coreference resolution. The syntactic parsing model outperforms current state-of-the-art models by discovering linguistic information shared across languages at the granularity of individual sentences. The coreference resolution system is one of the first attempts at joint multilingual modeling of named entity recognition and coreference resolution with limited linguistic resources. It ranks second on three of four languages when compared with state-of-the-art systems built with rich linguistic resources. I show that we can simultaneously model both the depth and the breadth of natural languages by exploiting the underlying linguistic structure shared across languages.
    By Andreea Bodnari. Ph. D.

    In-hospital mortality for liver resection for metastases: a simple risk score

    BACKGROUND: Surgical management of liver metastases from various primary tumors is increasingly common, but the mortality of such procedures is not well defined. Accurate predictions of perioperative risk could aid decision-making. MATERIALS AND METHODS: The Nationwide Inpatient Sample (1998-2005) was queried for patient discharges after hepatic procedures for metastases. Logistic regression and bootstrap methods were used to create an integer score estimating the risk of in-hospital mortality from patient demographics, comorbidities, procedure, and hospital type. A randomly selected 80% of the cohort was used to derive the risk score, and the score was validated in the remaining 20%. RESULTS: Across 50,537 patient discharges, overall in-hospital mortality was 2.6%. Factors included in the model were age, sex, Charlson comorbidity score, procedure type, and teaching hospital status. Integer values were assigned to these factors to calculate an additive score. Four score groups were assembled to stratify risk, with a 15-fold gradient of mortality ranging from 0.9% to 14.7% (P < 0.0001). The score discriminated well in both the derivation and the validation sets, with a c-statistic of 0.72 in each. CONCLUSION: An integer-based risk score can be used to predict in-hospital mortality after hepatic procedures for metastases and may be useful for preoperative patient counseling.
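The c-statistic reported in this abstract measures discrimination. For a binary outcome it is the probability that a randomly chosen patient who died received a higher score than a randomly chosen survivor, with tied scores counted as one half; this equals the area under the ROC curve. A small pairwise sketch follows. The `c_statistic` function is a hypothetical helper, and the scores and outcomes are made-up toy values, not data from the study.

```python
from itertools import product
from typing import Sequence


def c_statistic(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance (c) statistic for a binary outcome: the fraction of
    (event, non-event) pairs in which the event has the higher score,
    with tied scores counted as one half."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical integer risk scores and in-hospital deaths (1 = died); toy data only.
scores = [0, 1, 1, 2, 3, 3, 4, 5]
died = [0, 0, 1, 0, 0, 1, 0, 1]
print(round(c_statistic(scores, died), 3))  # → 0.667
```

The all-pairs loop is quadratic in cohort size; for tens of thousands of discharges a rank-based AUC routine would be the practical choice, but the pairwise form makes the definition explicit.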

    Predicting major complications after laparoscopic cholecystectomy: a simple risk score

    INTRODUCTION: Reported morbidity varies widely for laparoscopic cholecystectomy (LC). A reliable method to determine complication risk may be useful to optimize care. We developed an integer-based risk score to estimate the likelihood of major complications following LC. METHODS: Patient discharges for LC were identified in the Nationwide Inpatient Sample (1998-2006), and major complications were assessed using previously validated methods. Preoperative covariates, including patient demographics, disease characteristics, and hospital factors, were used in logistic regression and bootstrap analyses to generate an integer score predicting postoperative complication rates. A randomly selected 80% of the cohort was used to derive the risk score, and the score was validated in the remaining 20%. RESULTS: A total of 561,923 patient discharges were identified, with an overall complication rate of 6.5%. Predictive characteristics included age, sex, Charlson comorbidity score, biliary tract inflammation, hospital teaching status, and admission type. Integer values were assigned to these characteristics and used to calculate an additive score. Three groups were assembled to stratify risk, with a fourfold gradient in complications ranging from 3.2% to 13.5%. The score discriminated well in both the derivation and validation sets (c-statistic of 0.7). CONCLUSION: An integer-based risk score can be used to predict complications following LC and may assist in preoperative risk stratification and patient counseling.
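This abstract does not say how the integer values were derived from the regression. One common convention, assumed here and not confirmed by the source, divides each logistic regression coefficient by the smallest coefficient and rounds; a patient's points are then summed, and the total is binned into risk groups. The factor names, their dichotomizations, the coefficients, and the cutpoints below are all hypothetical, as are the helper functions.

```python
from typing import Dict, List


def integer_points(coefs: Dict[str, float]) -> Dict[str, int]:
    """Scale each coefficient by the smallest nonzero absolute coefficient and
    round to the nearest integer (exact .5 ties go to the even integer in Python)."""
    ref = min(abs(b) for b in coefs.values() if b != 0)
    return {name: round(b / ref) for name, b in coefs.items()}


def additive_score(patient: Dict[str, int], points: Dict[str, int]) -> int:
    """Sum the points of every binary risk factor the patient has (1 = present)."""
    return sum(points[name] for name, present in patient.items() if present)


def risk_group(score: int, cutpoints: List[int]) -> int:
    """Return a 0-based risk group: how many ascending cutpoints the score meets or exceeds."""
    return sum(score >= c for c in cutpoints)


# Hypothetical log-odds coefficients for dichotomized factors; NOT the study's values.
COEFS = {"age_over_65": 0.40, "male": 0.25, "charlson_2plus": 0.70, "inflammation": 0.90}
POINTS = integer_points(COEFS)
print(POINTS)  # → {'age_over_65': 2, 'male': 1, 'charlson_2plus': 3, 'inflammation': 4}

# Hypothetical patient and cutpoints defining three groups (0, 1, 2).
patient = {"age_over_65": 1, "male": 0, "charlson_2plus": 1, "inflammation": 0}
score = additive_score(patient, POINTS)
print(score, risk_group(score, [3, 6]))  # → 5 1
```

Rounding trades a little precision for a score that can be tallied by hand, which is the usual motivation for integer-based scores.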